@thanhlecongg

Fix the repr of string categories, which were displayed without quotes as reported in issue #63045, by adding a check for categories with dtype 'string'.

Member

@jorisvandenbossche jorisvandenbossche left a comment

Thanks for the PR!

Categorical
^^^^^^^^^^^
- Bug in :class:`Categorical` where constructing from a pandas :class:`Series` or :class:`Index` with ``dtype='object'`` did not preserve the categories' dtype as ``object``; now the ``categories.dtype`` is preserved as ``object`` for these cases, while numpy arrays and Python sequences with ``dtype='object'`` continue to infer the most specific dtype (for example, ``str`` if all elements are strings) (:issue:`61778`)
- Bug in :class:`pandas.Categorical` displaying string categories without quotes when constructed from a Series with dtype "string" (:issue:`63045`)
Member

Suggested change
Before:
- Bug in :class:`pandas.Categorical` displaying string categories without quotes when constructed from a Series with dtype "string" (:issue:`63045`)
After:
- Bug in :class:`pandas.Categorical` displaying string categories without quotes when using "string" dtype (:issue:`63045`)

The issue is not so much that the Categorical was created from a Series, but that it uses the string dtype for its categories (you can construct the same Categorical in other ways as well).
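To illustrate the point, here is a minimal sketch of two ways to end up with a Categorical whose categories use the string dtype, only one of which involves a Series. The variable names are made up for this example, and the exact repr text (including whether the categories are quoted) depends on the pandas version, so the sketch only inspects the categories' dtype.

```python
import pandas as pd

values = ["apple", "banana", "cherry", "cherry"]

# Route 1: construct from a Series that already has the "string" dtype.
from_series = pd.Categorical(pd.Series(values, dtype="string"))

# Route 2: no Series involved; pass the categories explicitly as an
# Index with the "string" dtype.
from_index = pd.Categorical(
    values,
    categories=pd.Index(["apple", "banana", "cherry"], dtype="string"),
)

# Both routes give categories backed by a string dtype, so both go
# through the same repr code path.
for cat in (from_series, from_index):
    print(type(cat.categories.dtype).__name__, list(cat.categories))
```

Because the dtype of the categories is what matters, the release note is better phrased in terms of the dtype than in terms of the constructor input.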

Author

Thanks for your review. I have updated this doc accordingly.

expected = "[1, '2', 3, 4]\nCategories (4, object): [1, 3, 4, '2']"
assert result == expected

def test_categorical_with_pandas_series(self):
Member

Suggested change
Before:
def test_categorical_with_pandas_series(self):
After:
def test_categorical_with_string_dtype(self):

Comment on lines 549 to 551
def test_categorical_with_pandas_series(self):
    # GH 63045
    s = Series(["apple", "banana", "cherry", "cherry"], dtype="string")
Member

Suggested change
Before:
def test_categorical_with_pandas_series(self):
    # GH 63045
    s = Series(["apple", "banana", "cherry", "cherry"], dtype="string")
After:
def test_categorical_with_pandas_series(self, string_dtype_no_object):
    # GH 63045
    s = Series(["apple", "banana", "cherry", "cherry"], dtype=string_dtype_no_object)

You could use this fixture here, which runs the test for the different string dtype variations, to make sure we now do this consistently for all string-like dtypes.

The only thing you will have to update is the "string" in the expected result below (you can include str(string_dtype_no_object) in the expected value).
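A rough sketch of how the expected value can be built from the dtype instead of hard-coding "string" is shown here. It is not the test from the PR: `STRING_DTYPES`, `expected_footer`, and `check_repr` are hypothetical names, and the local list only stands in for the `string_dtype_no_object` fixture, which is provided by the pandas test suite and covers more variants. Only the footer prefix is checked, since the rest of the repr (notably the quoting) depends on whether the fix is present in the installed pandas.

```python
import pandas as pd

# Hypothetical stand-in for the string_dtype_no_object fixture.
STRING_DTYPES = ["string", pd.StringDtype("python")]


def expected_footer(dtype):
    # str() of the resolved dtype is the name shown in the repr footer,
    # so the expected text follows whichever string dtype is under test.
    return f"Categories (3, {pd.api.types.pandas_dtype(dtype)})"


def check_repr(dtype):
    # GH 63045
    s = pd.Series(["apple", "banana", "cherry", "cherry"], dtype=dtype)
    result = repr(pd.Categorical(s))
    assert expected_footer(dtype) in result
    return result


results = [check_repr(dtype) for dtype in STRING_DTYPES]
```

In the real test the loop would be replaced by the fixture argument, and the full expected string would be compared with `==`.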

Author

Thanks for your review. I have updated this test accordingly.

@thanhlecongg
Author

Some tests failed, but I don’t think my latest commit caused this, since it only changed the test case and the docs. I also ran the failing tests locally and they were reported as xfail.

Development

Successfully merging this pull request may close these issues.

BUG: String categories are not quoted as expected

2 participants